To the Editor

The ability to translate large-scale genetics and genomics data into biological knowledge has 
not kept pace with our ability to generate these data sets. As a consequence, a major 
bottleneck in biomedical research has become access to data within a computational 
workspace that allows for robust, collaborative analyses. One innovative solution is to bring 
together scientific data, code, tools and disease models into an open commons or workspace, 
for example, the Synapse platform of Sage Bionetworks 1 . This environment allows for
real-time sharing of large genomic data sets, continuous peer review and rapid learning within a
system constructed to provide data access in a manner aligned with the informed consent 
provided by patients and research participants. 

This crowdsourcing approach has been used to predict breast cancer survival from clinical 
and omics data 2 and was suggested as a way to find new drugs 3 by soliciting contributions 
from a large online community collaborating or competing to answer an inherently difficult 
but important question 4 . Researchers initiating an open challenge invite solutions but also 
incentivize the process by offering new data, a process in which the participants' methods 
can be assessed by testing their predictions against previously unseen data sets. This year, 
Sage and DREAM (Dialogue for Reverse Engineering Assessments and Methods) are
running four open challenges
(http://www.sagebase.org/challenges-overview/2013-dream-challenges/).

Here we announce the challenge to develop genetic predictors of response to 
immunosuppressive therapy in a common autoimmune disease, rheumatoid arthritis (RA). 
Disease-modifying antirheumatic drugs such as those that block the inflammatory cytokine 
tumor necrosis factor-α (known as anti-TNF therapy) are not effective in all patients with
RA, with up to one-third of such patients failing to enter clinical remission after a standard 
course of therapy 5 . Moreover, the biological mechanisms underlying this failure are 
unknown, limiting the development of clinical biomarkers to guide either this therapy or the 
development of new drugs to target refractory cases. 

The Rheumatoid Arthritis Responder Challenge is for teams to build the best genetic 
predictor of response to anti-TNF therapy. There are two phases to the challenge: discovery 
and validation (Fig. 1). In the discovery phase, teams will utilize genomic data sets — several 
of which will be generated for the purposes of this challenge — and a variety of analytical 
methods to build predictive polygenic models of treatment response. We recently published 
a genome-wide association study (GWAS) in ~2,700 patients with RA treated with anti-TNF
therapy 6 . Our GWAS data indicate that the genetic architecture of the anti-TNF response is 
probably highly polygenic, similar to what has been observed for other complex traits, such 
as risk of RA 7 . Importantly, our challenge will incorporate a new GWAS data set, which 
will be used in the validation phase, in which models built in the discovery phase are tested. 
The data set of ~1,100 patients with RA treated with anti-TNF therapy will be made
available through a public-private partnership between the Consortium of Rheumatology
Researchers of North America, Inc. (CORRONA) and the Pharmacogenomics Research 
Network (PGRN) sponsored by the National Institute of General Medical Sciences 
(NIGMS) and the US National Institutes of Health (NIH). 
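
As a rough illustration of what a polygenic model of treatment response could look like, the following minimal sketch computes an additive score from SNP allele dosages and applies a cutoff. It is not the method of any participating team: the SNP names, weights, threshold and the function names `polygenic_score` and `classify` are all hypothetical placeholders chosen for this example.

```python
# Hypothetical sketch of an additive polygenic score.
# SNP identifiers, weights and the threshold are invented placeholders,
# not results from any GWAS.

def polygenic_score(dosages, weights):
    """Sum weight * allele dosage over SNPs present in both mappings."""
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)

def classify(score, threshold):
    """Label a patient from a score and an assumed cutoff."""
    return "responder" if score >= threshold else "nonresponder"

# Per-SNP effect weights (assumed) and one patient's dosages (0, 1 or 2 copies).
weights = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.05}
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(patient, weights)
label = classify(score, threshold=0.25)
```

In practice the weights would be estimated from discovery-phase data and the threshold tuned on held-out samples; SNPs missing from a patient's genotype are simply skipped here, which is one of several possible choices.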

A unique component of our Rheumatoid Arthritis Responder Challenge is the diversity of
participation across a number of groups from academic institutions, private foundations and 
for-profit companies. In addition to support from CORRONA and PGRN, we received 
funding from pharmaceutical companies (see complete list on our website; link below) and a 
private foundation (the Arthritis Foundation) to support the public commons. We also 
received support from the Arthritis Internet Registry (AIR) and the Broad Institute to 
generate new genomic data sets, as well as in-kind support from a large number of academic 
collaborators from across the world to make GWAS data available in the discovery phase. 
We anticipate that a winning classifier could enable a follow-on prospective clinical trial 
within the group of appropriately consented patients in AIR. 

Through Synapse, analysts who are inclined to establish collaborations will have the 
opportunity to see in real time the models that others are using so that each team can learn 
from the others (Fig. 1). A leaderboard will show the relative performance ranking of the 
different teams on the basis of a cross validation strategy designed to minimize overfitting. 
During the discovery phase, teams that choose to collaborate with each other will have the 
opportunity to check each other's algorithms for readability, speed and reproducibility. 
Then, during the validation phase, each team will submit computer code, which the
Sage-DREAM team (http://www.sagebase.org/) will test in Synapse to establish whether it runs as
expected to predict if a subject is an anti-TNF therapy responder or nonresponder on the 
basis of the GWAS data. Predefined performance metrics will be used to objectively 
determine the accuracy of the predictions, their statistical significance and the final 
performance ranking of the participating teams. The team that develops the most highly 
predictive model will be deemed the 'winner', with precise attribution of contributor roles 
going to all members of teams that contributed to building the final consensus model. 
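
The scoring idea described above can be sketched as follows, under stated assumptions. The challenge's actual predefined metrics and cross-validation design are set out in its rules and are not reproduced here; this example assumes the area under the ROC curve (AUC) as the metric and simple contiguous k-fold splits, and the function names `auc` and `k_fold_indices` are hypothetical.

```python
# Hypothetical sketch: AUC as an assumed performance metric and
# contiguous k-fold splits as an assumed cross-validation scheme.

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count one half.

    labels: 1 for responder, 0 for nonresponder.
    scores: predicted scores, higher meaning more likely to respond.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one nonresponder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# Toy data: four patients with known response and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = auc(labels, scores)
folds = list(k_fold_indices(5, 2))
```

Scoring each model only on samples held out from its training fold is what limits overfitting to the leaderboard; real data would also need shuffling or stratification before splitting, which this sketch omits.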

The best-performing models, therefore, will have passed a test of performance that is outside 
the realm of, and complements, traditional peer review. Indeed, this stringent test of method 
performance can be used as an enhanced way of publication vetting, what we call 
'challenge-assisted peer review'. Traditional peer review is essential for ensuring the clarity, 
originality, contextualization and logical thread of a discrete set of work that is ready to be 
used by researchers in the form of a published article. However, the complexity of working 
with omics data — entailing multiple analytical decisions, computational simulations and 
statistical calculations — means that referees are challenged to follow and check the 
components of even a traditional research paper. In our Rheumatoid Arthritis Responder 
Challenge, we will explore the feasibility of enhancing the reliability and transparency of 
conventional peer review in partnership with Nature Genetics. This can be achieved if the 
referees and authors of the paper reporting on the best-performing methods in the challenge 
are willing to leave their comments openly (yet anonymously) on the Synapse platform (Fig. 
1). We anticipate that the challenge-based assessment of accuracy will provide an objective 
metric of performance and a comparison with state-of-the-art analytical methodologies that 
will greatly enhance the task of refereeing a body of work with more quality control than is 
currently provided by conventional peer review. 

In conclusion, we believe that the Rheumatoid Arthritis Responder Challenge is an apt use 
of crowdsourcing in human genetics to gain insight into clinical prediction and disease 
biology. Details of the challenge, including the rules by which the models will be judged,
can be found at https://synapse.prod.sagebase.org/#!Synapse:syn1734172.

Figure 1.

Overview of the Rheumatoid Arthritis Responder Challenge. There are two phases to the 
challenge. In phase 1 (discovery), analysts build genetic models of response to anti-TNF 
therapy using SNP data from a GWAS of ~2,700 patients with RA. To facilitate model
building, additional genomic data will be made available. In a model of open collaboration, 
participants will use Synapse to post code, share insights and engage in rapid learning 
prepublication. In phase 2 (validation), models will be posted, tested and scored in an 
independent GWAS data set of ~1,100 patients with RA treated with anti-TNF therapy. To
complement challenge-assisted peer review (which occurs in both the discovery and 
validation phases), conventional peer review will have access to Synapse to understand the 
iterative process of model building. Synapse will allow study investigators to respond to 
peer-review critiques and resubmit versions of their models and studies. 